Abstract

In  this  paper  we  give  a  general  definition  of  residuals  for  regression  models 
with  independent  responses.  Our  definition  produces  residuals  which  are  exactly 
normal,  apart  from  sampling  variability  in  the  estimated  parameters,  by  inverting 
the  fitted  distribution  function  for  each  response  value  and  finding  the  equivalent 
standard  normal  quantile.  Our  definition  includes  some  randomization  to  achieve 
continuous  residuals  when  the  response  variable  is  discrete.  Quantile  residuals  are 
easily computed in computer packages such as SAS, S-Plus, GLIM or LispStat, and
allow residual analyses to be carried out in many commonly occurring situations in
which the customary definitions of residuals fail. Quantile residuals are applied in
this  paper  to  three  example  data  sets. 

Keywords: generalized linear model; deviance residual; Pearson residual; exponential
regression; logistic regression; Poisson regression; normal probability plot.


1  Introduction 


Residuals,  and  especially  plots  of  residuals,  play  a  central  role  in  the  checking  of  statistical 
models.  In  normal  linear  regression  the  residuals  are  normally  distributed  and  can  be 
standardized to have equal variances. In non-normal regression situations, such as logistic
regression  or  log-linear  analysis,  the  residuals,  as  usually  defined,  may  be  so  far  from 
normality  and  from  having  equal  variances  as  to  be  of  no  practical  use.  A  particular 
problem  occurs  when  the  response  variable  is  discrete  and  takes  on  a  small  number  of 
distinct values, as for Poisson data with mean not far from zero or binomial data with
mean  close  to  either  zero  or  the  number  of  trials.  In  such  situations  the  residuals  lie 
on  parallel  curves  corresponding  to  distinct  response  values,  and  these  spurious  curves 
distract  the  eye  seriously  from  any  meaningful  message  that  might  be  contained  in  a 
residual  plot. 

In this paper we give a general definition of residuals for regression models with
independent responses. Our definition produces residuals which are exactly normal, apart
from  sampling  variability  in  the  estimated  parameters,  by  inverting  the  fitted  distribution 
function  for  each  response  value  and  finding  the  equivalent  standard  normal  quantile. 
This  approach  is  closely  related  to  that  of  Cox  and  Snell  (1968),  but  whereas  Cox  and 
Snell  concentrate  on  mean  and  variance  corrections  we  concentrate  on  the  transformation 
to  normality.  Our  definition  includes  some  randomization  to  achieve  continuous  residuals 
when the response variable is discrete. Quantile residuals are easily computed in
computer packages such as SAS, S-Plus, GLIM or LispStat, and allow residual analyses to be
carried  out  in  many  commonly  occurring  situations  in  which  the  customary  definitions  of 
residuals  fail. 

For  other  work  on  residuals  for  non-normal  regression  models  see  Pierce  and  Schafer 
(1986)  or  McCullagh  and  Nelder  (1989)  and  the  references  therein.  In  the  discussion  at 
the  end  of  the  paper  we  briefly  indicate  how  quantile  residuals  may  be  extended  to  models 
with  dependent  responses. 

2  Pearson  and  Deviance  Residuals 

Let $y_1, \ldots, y_n$ be responses and for each $i$ let $x_i$ be a vector of covariates.
The $y_i$ are assumed to be independent and to follow a distribution
$\mathcal{P}(\mu_i, \phi)$ where $\mu_i = E(y_i)$ and $\phi$ is a parameter vector common
to all the $y_i$. The $\mu_i$ are assumed to depend on the $x_i$ and a vector of regression
parameters $\beta$. We have particularly in mind generalized linear models (McCullagh and
Nelder, 1989) in which the probability density or mass function of $y_i$ has the form

$$f(y; \theta_i, \phi) = a(y, \phi) \exp[\{y\theta_i - \kappa(\theta_i)\}/\phi]$$

where $a()$ and $\kappa()$ are known functions and $\mu_i = \kappa'(\theta_i)$. In this
model we have $\mathrm{var}(y_i) = \phi V(\mu_i)$ where $V(\mu_i) = \kappa''(\theta_i)$.
It is customary to assume that $g(\mu_i) = x_i^T \beta$ where $g$ is a known link function.
The parameter $\phi$ is the proportionality constant in the mean-variance relationship
and is known as the dispersion parameter.
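
As a concrete instance of the exponential-family form above, the Poisson distribution with
mean $\mu$ can be written in this way. The identification below is a standard one and is
included only as an illustration; choosing the Poisson family at this point is our
assumption, not something the text singles out.

```latex
f(y; \theta, \phi) = \frac{1}{y!} \exp\{ y \log\mu - \mu \},
\qquad \theta = \log\mu, \quad \kappa(\theta) = e^{\theta}, \quad \phi = 1,
\quad a(y, \phi) = 1/y!,
\\
\mu = \kappa'(\theta) = e^{\theta}, \qquad
V(\mu) = \kappa''(\theta) = e^{\theta} = \mu, \qquad
\mathrm{var}(y) = \phi V(\mu) = \mu .
```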

In  the  context  of  generalized  linear  models,  two  definitions  of  residuals  have  been 
commonly used in practice. The Pearson residual is defined by

$$r_{p,i} = \frac{y_i - \hat\mu_i}{V(\hat\mu_i)^{1/2}}$$

where $\hat\mu_i$ is the fitted value for $\mu_i$. The Pearson residual has the advantage
that its mean and variance are exactly zero and $\phi$ respectively, apart from sampling
variability in $\hat\mu_i$. The deviance residuals are defined in terms of the unit
deviances. For the above model, let $t(y, \mu) = y\theta - \kappa(\theta)$, where $\theta$
is the canonical parameter corresponding to the mean $\mu$. Assuming that $y$ is in the
domain of $\mu$, the unit deviance is

$$d(y, \mu) = 2\{t(y, y) - t(y, \mu)\}.$$

The deviance residual is

$$r_{d,i} = d(y_i, \hat\mu_i)^{1/2} \, \mathrm{sign}(y_i - \hat\mu_i).$$

Pierce and Schafer (1986) have argued on theoretical grounds that the deviance residuals
should be more nearly normal than the Pearson. Indeed both converge to normality as
$\phi \to 0$ relative to the $\mu_i$, the Pearson residuals at rate $O(\phi^{1/2})$ by the
Central Limit Theorem and the deviance residuals at $O(\phi)$ by the saddle-point
approximation to $f(y; \theta_i, \phi)$. The Pearson and deviance residuals coincide and
are exactly normal, ignoring variability in $\hat\mu_i$, for the normal linear model. The
deviance residual is also exactly normal when the response is inverse-Gaussian. In other
cases and for large $\phi/\mu_i$, however, neither type of residual can be guaranteed to be
closely normal, and the deviance residuals do not generally have zero means or equal
variances even at the true values $\mu_i$.
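
To make the two definitions concrete, the following is a minimal sketch of Pearson and
deviance residuals for a Poisson response, where $V(\mu) = \mu$ and $\phi = 1$. The Poisson
family, the function names and the small data set are assumptions made for illustration;
the fitted means are simply supplied rather than estimated.

```python
import math

def poisson_unit_deviance(y, mu):
    """Unit deviance d(y, mu) = 2{t(y, y) - t(y, mu)} for the Poisson family."""
    # For the Poisson family theta = log(mu) and kappa(theta) = mu, so
    # t(y, mu) = y*log(mu) - mu. The term y*log(y) is taken as 0 when y == 0.
    t_saturated = (y * math.log(y) - y) if y > 0 else 0.0
    t_fitted = y * math.log(mu) - mu
    return 2.0 * (t_saturated - t_fitted)

def pearson_residual(y, mu):
    """(y - mu) / V(mu)^(1/2) with V(mu) = mu."""
    return (y - mu) / math.sqrt(mu)

def deviance_residual(y, mu):
    """Signed square root of the unit deviance."""
    d = max(poisson_unit_deviance(y, mu), 0.0)  # guard against tiny negative rounding
    return math.copysign(math.sqrt(d), y - mu)

# Hypothetical responses and fitted means, for illustration only.
ys = [0, 1, 4, 2]
mus = [0.8, 1.5, 2.0, 2.0]
for y, mu in zip(ys, mus):
    print(y, mu, round(pearson_residual(y, mu), 3), round(deviance_residual(y, mu), 3))
```

For small means the two residuals can differ noticeably, and for a given fitted mean each
can take only as many values as the response does.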


3  Randomized  Quantile  Residuals 


Let $F(y; \mu, \phi)$ be the cumulative distribution function of $\mathcal{P}(\mu, \phi)$.
If $F$ is continuous, the quantile residuals are defined by

$$r_{q,i} = \Phi^{-1}\{F(y_i; \hat\mu_i, \hat\phi)\}$$

where $\Phi()$ is the cumulative distribution function of the standard normal. Apart from
sampling variability in $\hat\mu_i$ and $\hat\phi$, the $r_{q,i}$ are exactly standard
normal. This implies that the distribution of $r_{q,i}$ converges to standard normal if
$\beta$ and $\phi$ are consistently estimated. The above definition is a special case of
Cox and Snell’s (1968) “crude” residuals.


Example 1: Leukemia data. Feigl and Zelen (1965) discuss some data relating the survival
times $y_i$ of leukemia patients to their initial white blood cell counts $x_i$ and to
existence of AG-factor. Following Feigl and Zelen, we treat the survival times as
exponential, $y_i \sim \mathrm{Exp}(\mu_i)$. We work with a log-linear model for the means,
including separate intercepts for the two AG-factor groups,

$$\log \mu_i = \begin{cases} \alpha_1 + \beta \log x_i & \text{AG positive} \\ \alpha_2 + \beta \log x_i & \text{AG negative.} \end{cases}$$

Cox and Snell (1968) considered a subset of this data, and defined approximately
exponential crude residuals $R_i = y_i/\hat\mu_i$, where the $\hat\mu_i$ are the estimated
means. In this case the quantile residuals

$$r_{q,i} = \Phi^{-1}\{1 - \exp(-y_i/\hat\mu_i)\}$$


are a simple transformation of the $R_i$. A normal probability plot of the quantile
residuals confirms the assumption of an exponential distribution. Figure 1 plots the
quantile residuals versus the covariate. The three residuals (cases 17, 31 and 33) in the
upper right-hand corner of the plot are relatively separate from the body of the other
residuals, and without them there appears to be a marked negative trend. Cases 17, 31 and
33 may be outliers, or it may be that the dispersion of the residuals increases at the
largest white blood cell counts. In any case, the three cases identified appear from the
residual plot to be jointly influential. Assigning the identified cases zero weight changes
$\hat\beta$ nearly three-fold, from -0.30 to -0.84, compared with a standard error of 0.14.

Figure 1: Plot of quantile residuals versus the covariate for the leukemia data. Circles
represent patients who are AG-positive, crosses AG-negative.
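
The continuous-case definition for an exponential response can be sketched in a few lines.
This is only an illustration under stated assumptions: the survival times and fitted means
below are invented (they are not the Feigl and Zelen data), the function name is ours, and
the means are taken as given rather than estimated from a log-linear fit.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1

def exponential_quantile_residual(y, mu_hat):
    """Quantile residual Phi^{-1}{F(y; mu_hat)} for an exponential response, y > 0."""
    crude = y / mu_hat            # Cox and Snell crude residual R_i
    p = -math.expm1(-crude)       # F(y; mu_hat) = 1 - exp(-y / mu_hat)
    # inv_cdf needs 0 < p < 1, so a very large crude residual (p rounding to 1)
    # would need special handling that this sketch omits.
    return STD_NORMAL.inv_cdf(p)

# Hypothetical survival times and fitted means, for illustration only.
survival_times = [65.0, 156.0, 4.0, 22.0]
fitted_means = [50.0, 80.0, 30.0, 22.0]
residuals = [exponential_quantile_residual(y, m)
             for y, m in zip(survival_times, fitted_means)]
print([round(r, 3) for r in residuals])
```

A response equal to the fitted median, $y = \hat\mu \log 2$, maps to a residual of zero,
and the residual increases monotonically with the crude residual.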

If $F$ is not continuous, a more general definition of quantile residuals is required. Let
$a_i = \lim_{y \uparrow y_i} F(y; \hat\mu_i, \hat\phi)$ and
$b_i = F(y_i; \hat\mu_i, \hat\phi)$. We define the randomized quantile residual for $y_i$
by

$$r_{q,i} = \Phi^{-1}(u_i)$$

where $u_i$ is a uniform random variable on the interval $(a_i, b_i]$. Again, the $r_{q,i}$
are exactly standard normal, apart from sampling variability in $\hat\mu_i$ and
$\hat\phi$. The randomization strategy employed here is similar to the strategy of
jittering (Chambers et al., 1983) to prevent masses of overlapping points in plots. Whereas
jittering applies a uniform random component to the response, our uniform random component
is on the cumulative probability scale and is tailored to the actual probability mass at
the point in question. Our randomization is the minimum necessary so that no granularity
remains in the resulting residual distribution.

Figure 2: Deviance and quantile residuals versus the covariate from a logistic regression.
The response is simulated bin(3,p) with logit p depending quadratically on the covariate.
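
The randomized definition can be sketched for a Poisson response as follows. The Poisson
family, the helper names and the clamping constant are assumptions made for illustration;
the fitted mean is supplied directly rather than estimated.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Pois(mu); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)          # P(Y = 0)
    total = term
    for j in range(1, k + 1):
        term *= mu / j            # P(Y = j) from P(Y = j - 1)
        total += term
    return min(total, 1.0)

def randomized_quantile_residual(y, mu_hat, rng):
    """Draw one randomized quantile residual for a Poisson count y."""
    a = poisson_cdf(y - 1, mu_hat)            # a_i: left limit of F at y_i
    b = poisson_cdf(y, mu_hat)                # b_i: F at y_i
    u = a + (b - a) * (1.0 - rng.random())    # uniform on (a, b]
    # Pragmatic safeguard (our addition): keep u strictly inside (0, 1) so that
    # inv_cdf stays finite when b rounds to 1 in floating point.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return STD_NORMAL.inv_cdf(u)

rng = random.Random(2024)   # seeded only so the illustration is reproducible
print([round(randomized_quantile_residual(y, 2.0, rng), 3) for y in [0, 1, 2, 3, 6]])
```

Each residual is confined to the interval $[\Phi^{-1}(a_i), \Phi^{-1}(b_i)]$, so
observations carrying little probability mass are barely affected by the randomization.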

Example 2: Simulated binomial data. A logistic linear regression was used to model 60
binomial observations with binomial denominator $n = 3$, i.e., the responses were assumed
to be independently distributed as $y_i \sim \mathrm{bin}(n, p_i)$, with $n = 3$ and
$\mathrm{logit}(p_i) = \beta_0 + \beta_1 x_i$ where $x_i$ is a covariate. The first plot of
Figure 2 displays the deviance residuals versus the covariate. The points in this plot lie
on four parallel curves corresponding to the four possible values for the response. The
curves make it difficult to see any other pattern in the data. The second plot displays the
quantile residuals versus the covariate. In this plot it is clear that the residuals follow
a quadratic pattern. The data for this example were in fact computer generated with
$\mathrm{logit}(p_i)$ depending quadratically on the $x_i$.
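
The four curves arise because, at any covariate value, the deviance residual can take only
one of four values. The sketch below tabulates those values along a grid; the coefficients
0.2 and 0.8 and the grid of covariate values are hypothetical, and no model is actually
fitted.

```python
import math

def _xlogratio(a, b):
    """a * log(a / b), with the convention that the term is 0 when a == 0."""
    return a * math.log(a / b) if a > 0 else 0.0

def binomial_deviance_residual(y, n, p):
    """Deviance residual for y ~ bin(n, p) with 0 < p < 1."""
    mu = n * p
    d = 2.0 * (_xlogratio(y, mu) + _xlogratio(n - y, n - mu))
    return math.copysign(math.sqrt(max(d, 0.0)), y - mu)

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

n = 3
xs = [-2.0 + 0.5 * k for k in range(9)]      # hypothetical covariate grid
beta0, beta1 = 0.2, 0.8                      # hypothetical fitted coefficients
# One curve per possible response value y = 0, 1, 2, 3.
curves = {
    y: [binomial_deviance_residual(y, n, logistic(beta0 + beta1 * x)) for x in xs]
    for y in range(n + 1)
}
for y, values in curves.items():
    print(y, [round(v, 2) for v in values])
```

Every observed deviance residual must fall on one of these four curves, which is the
banding visible in the first plot of Figure 2.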

Example 3: Fathers’ and sons’ occupations. Brown (1974) and Kotze and Hawkins (1984)
analyze a sparse 14 x 14 contingency table showing the cross-classification of occupations
of fathers (rows) by occupations of sons (columns). The data was originally published by
Pearson (1904) and appears also in Hand et al. (1994). Brown, Kotze and Hawkins were
interested in identifying those cells which are outliers relative to the independence
model. We take a similar approach, with the difference that the quantile residual approach
allows us to look for outliers relative to a more realistic model. Observing that there is
an a priori expectation that sons will be influenced by their father’s occupation, we fit a
log-linear Poisson regression model to the counts with row and column effects and with an
effect for equality of father’s and son’s occupation, i.e.,
$y_{ij} \sim \mathrm{Pois}(\mu_{ij})$, with

$$\log \mu_{ij} = \mu_0 + \alpha_i + \beta_j + \gamma x_{ij} \qquad (1)$$

and $x_{ij} = 1$ if $i = j$ and 0 otherwise. Figure 3 is a normal probability plot of
quantile residuals from this model. The largest positive residual corresponds to the (2,2)
cell: sons almost always continue to work in the Arts if their father did. Figure 3 shows
evidence of large negative residuals as well as large positive residuals. Although none of
the negative residuals are individually significant, and the actual contingency table cells
represented in the left tail of the probability plot vary with each realization of the
quantile residuals, the overall pattern is preserved across realizations. The quantile
residual plot shows in this way that there are too many zero counts in the contingency
table to be compatible with the above model. No other method which has been applied to this
data in the literature is able to show this aspect of the data. Although Figure 3 shows
clear evidence of lack of fit, the model (1) and the models which arise from it by deleting
selected cells do give an appreciably better fit to this data than the independence models
considered by earlier authors.

Figure 3: Normal probability plot of the quantile residuals from the fathers and sons
occupation data.

4  Discussion  and  Extensions 

In  this  paper  quantile  residuals  are  computed  by  finding  the  equivalent  standard  normal 
deviate  for  each  response  observation.  Any  reference  distribution  could  have  been  chosen 
for  the  residuals,  but  the  normal  seems  to  be  the  easiest  to  interpret  for  graphical  purposes. 

Randomization is used to produce continuously distributed residuals when the response is
discrete or has a discrete component. This means that the quantile residuals will vary from
one realization to another for a given data set and fitted model. For the sake of brevity,
we have given only one realization of the quantile residuals for each example in this
paper. In practice, though, we have found it useful to routinely plot four
realizations  of  the  quantile  residuals.  Any  pattern  in  the  residuals  which  is  not  consistent 
across  the  realizations  is  then  ignored. 
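
Producing several realizations only requires repeating the uniform draws. The sketch below
does this for binomial responses; the counts, fitted probabilities, function names and seed
are all hypothetical, and the plotting step is left out.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def binomial_cdf(k, n, p):
    """P(Y <= k) for Y ~ bin(n, p); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    total = sum(math.comb(n, j) * p**j * (1.0 - p)**(n - j) for j in range(k + 1))
    return min(total, 1.0)

def residual_realizations(ys, ps, n, n_realizations=4, seed=0):
    """Return n_realizations independent sets of randomized quantile residuals."""
    rng = random.Random(seed)
    realizations = []
    for _ in range(n_realizations):
        row = []
        for y, p in zip(ys, ps):
            a, b = binomial_cdf(y - 1, n, p), binomial_cdf(y, n, p)
            u = a + (b - a) * (1.0 - rng.random())   # uniform on (a, b]
            u = min(max(u, 1e-12), 1.0 - 1e-12)      # keep inv_cdf finite (our safeguard)
            row.append(STD_NORMAL.inv_cdf(u))
        realizations.append(row)
    return realizations

ys = [0, 1, 3, 2]              # hypothetical counts out of n = 3
ps = [0.2, 0.4, 0.7, 0.5]      # hypothetical fitted probabilities
for row in residual_realizations(ys, ps, n=3):
    print([round(r, 2) for r in row])
```

Each row could then be plotted against the covariate in its own panel, and only features
visible in every panel would be interpreted.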

Quantile residuals provide a logical approach to added variable plots (Cook and Weisberg,
1982) in generalized linear models. An added variable plot for a variable $x$ would consist
of plotting the quantile residuals, for the model excluding $x$, versus $x_a$, where $x_a$
is $x$ adjusted for the other covariates in the model. The vector $x_a$ would be chosen to
be orthogonal to the other covariates, relative to the covariance matrix of the $y_i$. It
might be computed as the residuals from weighted least squares regression of $x$ on the
other covariates, using as weights the working weights from the generalized linear model.
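
The adjustment step can be sketched for the simplest case, in which the other covariates
are an intercept and a single covariate z. That restriction, the function name and the
numbers below are assumptions for illustration; the working weights would in practice come
from the fitted generalized linear model.

```python
def adjusted_covariate(x, z, w):
    """Residuals from weighted least squares regression of x on an intercept and z."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw      # weighted mean of x
    z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sw      # weighted mean of z
    szz = sum(wi * (zi - z_bar) ** 2 for wi, zi in zip(w, z))
    szx = sum(wi * (zi - z_bar) * (xi - x_bar) for wi, zi, xi in zip(w, z, x))
    slope = szx / szz
    return [xi - x_bar - slope * (zi - z_bar) for xi, zi in zip(x, z)]

# Hypothetical candidate variable, existing covariate and working weights.
x = [1.0, 2.0, 4.0, 3.0, 5.0]
z = [0.5, 1.0, 1.5, 2.0, 2.5]
w = [1.0, 2.0, 1.0, 0.5, 1.5]
x_a = adjusted_covariate(x, z, w)
print([round(v, 3) for v in x_a])
```

By construction the adjusted vector is orthogonal, in the weighted inner product, to both
the constant vector and z.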

Independence of the response observations was assumed in this paper. The method of quantile
residuals can be extended to dependent data situations by expressing the multivariate
likelihood as a product of univariate conditional likelihoods. For example we might define
the $i$th conditional quantile residual from the conditional distribution of $y_i$ given
$y_1, \ldots, y_{i-1}$ instead of from the marginal distribution of $y_i$ as in the paper.
This would provide independent, standard normal residuals.
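
As one hypothetical instance of the conditional construction, consider a stationary
Gaussian first-order autoregression with known parameters. The model, the parameter values
and the function name are our assumptions, chosen because the conditional distributions are
available in closed form.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def ar1_conditional_quantile_residuals(y, rho, sigma):
    """Conditional quantile residuals for y_t = rho * y_{t-1} + e_t, e_t ~ N(0, sigma^2)."""
    residuals = []
    for t, y_t in enumerate(y):
        if t == 0:
            # No past to condition on: use the stationary marginal distribution.
            cond = NormalDist(0.0, sigma / math.sqrt(1.0 - rho ** 2))
        else:
            # Distribution of y_t given y_1, ..., y_{t-1} depends only on y_{t-1}.
            cond = NormalDist(rho * y[t - 1], sigma)
        residuals.append(STD_NORMAL.inv_cdf(cond.cdf(y_t)))
    return residuals

series = [0.5, 0.9, -0.2, 0.1]      # hypothetical observations
print([round(r, 3) for r in ar1_conditional_quantile_residuals(series, 0.6, 1.0)])
```

In this Gaussian case the conditional quantile residuals reduce to the standardized
one-step prediction errors, which is a useful check on the construction.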

Finally we consider the sampling variability of the $\hat\mu_i$, which has for simplicity
been ignored throughout this paper. Treating the $\hat\mu_i$ as fixed is appropriate when
good information is available on the model parameters, but may be unrealistic, for example,
for designed experiments in which the number of parameters is not small compared to the
number of observations. In normal linear models, the REML estimate of the variance
structure is obtained from the marginal distribution of any set of zero mean contrasts,
$Z^T y$ say. In a similar way, independent and identically distributed residuals could be
obtained by transforming from the $y_i$ to any orthonormal set of zero mean contrasts.
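
For the simplest normal model, in which all responses share a common mean, one orthonormal
set of zero mean contrasts is the Helmert set. The sketch below builds it; restricting to
the common-mean model and the function names are our assumptions, and a general design
matrix would need a basis for the orthogonal complement of its column space instead.

```python
import math

def helmert_contrasts(n):
    """Return n - 1 orthonormal contrast vectors of length n, each summing to zero."""
    rows = []
    for k in range(1, n):
        scale = math.sqrt(k * (k + 1))
        # Compare the mean of the first k responses with response k + 1.
        row = [1.0 / scale] * k + [-k / scale] + [0.0] * (n - k - 1)
        rows.append(row)
    return rows

def contrast_residuals(y):
    """Apply the contrasts to y, giving n - 1 zero mean, uncorrelated residuals."""
    return [sum(c * yi for c, yi in zip(row, y)) for row in helmert_contrasts(len(y))]

y = [3.0, 1.0, 4.0, 1.5]            # hypothetical responses
print([round(r, 3) for r in contrast_residuals(y)])
```

Because the contrasts are orthonormal and orthogonal to the constant vector, the sum of
squares of the n - 1 contrast residuals equals the usual residual sum of squares about the
sample mean.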

Extending this idea to non-normal regression is more difficult, but could in principle
be done using the conditional approach of Smyth and Verbyla (1995). Smyth and Verbyla
(1995) have argued that REML estimation for generalized linear models should proceed by
considering the conditional distribution of the $y_i$ given $\hat\beta$. Independent
quantile residuals could therefore be defined by considering the conditional distribution
of each $y_i$ given $y_1, \ldots, y_{i-1}$ and $\hat\beta$. For certain values of $i$ this
distribution would be degenerate; these values could be ignored without loss of
information.